[Chapter Nine][Previous]
[Next] [Art of
Assembly][Randall Hyde]
Art of Assembly: Chapter Nine
- 9.5 - Machine and Arithmetic Idioms
- 9.5.1 - Multiplying Without MUL and IMUL
- 9.5.2 - Division Without DIV and IDIV
- 9.5.3 - Using AND to Compute Remainders
- 9.5.4 - Implementing Modulo-n Counters
with AND
- 9.5.5 - Testing an Extended Precision
Value for 0FFFF..FFh
- 9.5.6 - TEST Operations
- 9.5.7 - Testing Signs with the XOR Instruction
9.5 Machine and Arithmetic Idioms
An idiom is an idiosyncrasy. Several arithmetic operations and 80x86
instructions have idiosyncracies that you can take advantage of when writing
assembly language code. Some people refer to the use of machine and arithmetic
idioms as "tricky programming" that you should always avoid in
well written programs. While it is wise to avoid tricks just for the sake
of tricks, many machine and arithmetic idioms are well-known and commonly
found in assembly language programs. Some of them can be really tricky,
but a good number of them are simply "tricks of the trade." This
text cannot even begin to present all of the idioms in common use today;
they are too numerous and the list is constantly changing. Nevertheless,
there are some very important idioms that you will see all the time, so
it makes sense to discuss those.
9.5.1 Multiplying Without MUL and IMUL
If you take a quick look at the timing for the multiply instruction,
you'll notice that the execution time for this instruction is rather long.
Only the div
and idiv
instructions take longer
on the 8086. When multiplying by a constant, you can avoid the performance
penalty of the mul
and imul
instructions by using
shifts, additions, and subtractions to perform the multiplication.
Remember, a shl
operation performs the same operation as multiplying
the specified operand by two. Shifting to the left two bit positions multiplies
the operand by four. Shifting to the left three bit positions multiplies
the operand by eight. In general, shifting an operand to the left n bits
multiplies it by 2n. Any value can be multiplied by some constant using
a series of shifts and adds or shifts and subtractions. For example, to
multiply the ax
register by ten, you need only multiply it
by eight and then add in two times the original value. That is, 10*ax
= 8*ax + 2*ax
. The code to accomplish this is
shl ax, 1 ;Multiply AX by two
mov bx, ax ;Save 2*AX for later
shl ax, 1 ;Multiply AX by four
shl ax, 1 ;Multiply AX by eight
add ax, bx ;Add in 2*AX to get 10*AX
The ax
register (or just about any register, for that matter)
can be multiplied by most constant values much faster using shl
than
by using the mul
instruction. This may seem hard to believe
since it only takes two instructions to compute this product:
mov bx, 10
mul bx
However, if you look at the timings, the shift and add example above requires
fewer clock cycles on most processors in the 80x86 family than the mul
instruction. Of course, the code is somewhat larger (by a few bytes),
but the performance improvement is usually worth it. Of course, on the later
80x86 processors, the mul
instruction is quite a bit faster
than the earlier processors, but the shift and add scheme is generally faster
on these processors as well.
You can also use subtraction with shifts to perform a multiplication operation.
Consider the following multiplication by seven:
mov bx, ax ;Save AX*1
shl ax, 1 ;AX := AX*2
shl ax, 1 ;AX := AX*4
shl ax, 1 ;AX := AX*8
sub ax, bx ;AX := AX*7
This follows directly from the fact that ax*7 = (ax*8)-ax
.
A common error made by beginning assembly language students is subtracting
or adding one or two rather than ax*1
or ax*2
.
The following does not compute ax*7
:
shl ax, 1
shl ax, 1
shl ax, 1
sub ax, 1
It computes (8*ax)-1, something entirely different (unless, of course, ax
= 1). Beware of this pitfall when using shifts, additions, and subtractions
to perform multiplication operations.
You can also use the lea instruction to compute certain products on 80386
and later processors. The trick is to use the 80386 scaled index mode. The
following examples demonstrate some simple cases:
lea eax, [ecx][ecx] ;EAX := ECX * 2
lea eax, [eax]eax*2] ;EAX := EAX * 3
lea eax, [eax*4] ;EAX := EAX * 4
lea eax, [ebx][ebx*4] ;EAX := EBX * 5
lea eax, [eax*8] ;EAX := EAX * 8
lea eax, [edx][edx*8] ;EAX := EDX * 9
9.5.2 Division Without DIV and IDIV
Much as the shl
instruction can be used for simulating
a multiplication by some power of two, the shr
and sar
instructions can be used to simulate a division by a power of two.
Unfortunately, you cannot use shifts, additions, and subtractions to perform
a division by an arbitrary constant as easily as you can use these instructions
to perform a multiplication operation.
Another way to perform division is to use the multiply instructions. You
can divide by some value by multiplying by its reciprocal. The multiply
instruction is marginally faster than the divide instruction; multiplying
by a reciprocal is almost always faster than division.
Now you're probably wondering "how does one multiply by a reciprocal
when the values we're dealing with are all integers?" The answer, of
course, is that we must cheat to do this. If you want to multiply by one
tenth, there is no way you can load the value 1/10th into an 80x86 register
prior to performing the division. However, we could multiply 1/10th by 10,
perform the multiplication, and then divide the result by ten to get the
final result. Of course, this wouldn't buy you anything at all, in fact
it would make things worse since you're now doing a multiplication by ten
as well as a division by ten. However, suppose you multiply 1/10th by 65,536
(6553), perform the multiplication, and then divide by 65,536. This would
still perform the correct operation and, as it turns out, if you set up
the problem correctly, you can get the division operation for free. Consider
the following code that divides ax
by ten:
mov dx, 6554 ;Round (65,536/10)
mul dx
This code leaves ax
/10 in the dx
register.
To understand how this works, consider what happens when you multiply ax
by 65,536 (10000h). This simply moves ax
into dx
and sets ax
to zero. Multiplying by 6,554 (65,536 divided
by ten) puts ax
divided by ten into the dx
register.
Since mul
is marginally faster than div
, this
technique runs a little faster than using a straight division.
Multiplying by the reciprocal works well when you need to divide by a constant.
You could even use it to divide by a variable, but the overhead to compute
the reciprocal only pays off if you perform the division many, many times
(by the same value).
9.5.3 Using AND to Compute Remainders
The and
instruction can be used to quickly compute remainders
of the form:
dest := dest MOD 2n
To compute a remainder using the and
instruction, simply and
the operand with the value 2n-1. For example, to compute ax = ax mod
8
simply use the instruction:
and ax, 7
Additional examples:
and ax, 3 ;AX := AX mod 4
and ax, 0Fh ;AX := AX mod 16
and ax, 1Fh ;AX := AX mod 32
and ax, 3Fh ;AX := AX mod 64
and ax, 7Fh ;AX := AX mod 128
mov ah, 0 ;AX := AX mod 256
; (Same as ax and 0FFh)
9.5.4 Implementing Modulo-n Counters with AND
If you want to implement a counter variable that counts up to 2n-1 and
then resets to zero, simply using the following code:
inc CounterVar
and CounterVar, nBits
where nBits is a binary value containing n one bits right justified in the
number. For example, to create a counter that cycles between zero and fifteen,
you could use the following:
inc CounterVar
and CounterVar, 00001111b
9.5.5 Testing an Extended Precision Value for 0FFFF..FFh
The and
instruction can be used to quickly check a multi-word
value to see if it contains ones in all its bit positions. Simply load the
first word into the ax
register and then logically and the
ax
register with all the remaining words in the data structure.
When the and operation is complete, the ax
register will contain
0FFFFh if and only if all the words in that structure contained 0FFFFh.
E.g.,
mov ax, word ptr var
and ax, word ptr var+2
and ax, word ptr var+4
.
.
.
and ax, word ptr var+n
cmp ax, 0FFFFh
je Is0FFFFh
9.5.6 TEST Operations
Remember, the test
instruction is an and
instruction
that doesn't retain the results of the and operation (other than the flag
settings). Therefore, many of the comments concerning the and operation
(particularly with respect to the way it affects the flags) also hold for
the test
instruction. However, since the test
instruction
doesn't affect the destination operand, multiple bit tests may be performed
on the same value. Consider the following code:
test ax, 1
jnz Bit0
test ax, 2
jnz Bit1
test ax, 4
jnz Bit3
etc.
This code can be used to successively test each bit in the ax
register
(or any other operand for that matter). Note that you cannot use the test/cmp
instruction pair to test for a specific value within a string of
bits (as you can with the and/cmp
instructions). Since test
doesn't strip out any unwanted bits, the cmp
instruction
would actually be comparing the original value rather than the stripped
value. For this reason, you'll normally use the test
instruction
to see if a single bit is set or if one or more bits out of a group of bits
are set. Of course, if you have an 80386 or later processor, you can also
use the bt
instruction to test individual bits in an operand.
Another important use of the test
instruction is to efficiently
compare a register against zero. The following test instruction sets the
zero flag if and only if ax
contains zero (anything anded with
itself produces its original value; this sets the zero flag only if that
value is zero):
test ax, ax
The test instruction is shorter than
cmp ax, 0
or
cmp eax, 0
though it is no better than
cmp al, 0
Note that you can use the and
and or
instructions
to test for zero in a fashion identical to test
. However, on
pipelined processors like the 80486 and Pentium chips, the test
instruction
is less likely to create a hazard since it does not store a result back
into its destination register.
9.5.7 Testing Signs with the XOR Instruction
Remember the pain associated with a multi-precision signed multiplication
operation? You need to determine the sign of the result, take the absolute
value of the operands, multiply them, and then adjust the sign of the result
as determined before the multiplication operation. The sign of the product
of two numbers is simply the exclusive-or of their signs before performing
the multiplication. Therefore, you can use the xor
instruction
to determine the sign of the product of two extended precision numbers.
E.g.,
32x32 Multiply:
mov al, byte ptr Oprnd1+3
xor al, byte ptr Oprnd2+3
mov cl, al ;Save sign
; Do the multiplication here (don't forget to take the absolute
; value of the two operands before performing the multiply).
.
.
.
; Now fix the sign.
cmp cl, 0 ;Check sign bit
jns ResultIsPos
; Negate the product here.
.
.
.
ResultIsPos:
- 9.5 - Machine and Arithmetic
Idioms
- 9.5.1 - Multiplying Without MUL and IMUL
- 9.5.2 - Division Without DIV and IDIV
- 9.5.3 - Using AND to Compute Remainders
- 9.5.4 - Implementing Modulo-n Counters
with AND
- 9.5.5 - Testing an Extended Precision
Value for 0FFFF..FFh
- 9.5.6 - TEST Operations
- 9.5.7 - Testing Signs with the XOR Instruction
Art of Assembly: Chapter Nine - 27 SEP 1996
[Chapter Nine][Previous]
[Next] [Art of
Assembly][Randall Hyde]